Caged Shipments Prediction

Jan 2022 ~ FedEx Express

Length:   2mo (at 0.5 FTE)

Programming language:   Python (Pandas, NumPy, Matplotlib, Plotly, scikit-learn, LightGBM)

Data:   Over 600.000 shipped parcels having 742 features, including the target variable - whether the parcel was caged

Problem description:
Develop a LightGBM model for classifying whether a shipment will be caged (secured until the correct information is obtained)

Approach:
Since the data was cleaned previously, the first step involved slicing it into train and test sets. Then, a LightGBM model was created, and its hyperparameters were first tuned through a Random Search. The options in the grid were selected according to the official documentation and considering the balance between overfitting, performance, and training speed. Afterward, the best parameters obtained from the Random Search were put in a new grid next to other values in their close vicinity. The respective grid was used to optimize the hyperparameters one more time through Grid Search. Using the best hyperparameters followed from Grid Search, the model was fitted another time until the learning curve of the validation data reached a plateau to see after how many iterations the loss converges. Finally, the model was fitted on both the train and validation data.

Results:
An accuracy of 81% was observed after evaluating the predictions on the test data. However, the company uses a custom, greater-is-better score for this problem called the clearance score. The LightGBM achieved a clearance score of 2.08, outperforming the deployed Neural Network by 53%. Moreover, the LightGBM model was trained in only 45 seconds, while the Neural Network took approximately 22 minutes. Accordingly, the former was 29 times faster.

Benchmark Table

  • Address

    Amsterdam, the Netherlands